Forecasting is an important approach to plan the future effectively and efficiently. A time series is a sequence of data points, typically consisting of successive measurements made over a uniform time interval. Time series forecasting is the use of a model to predict future values based on previously observed values.
A leading retailer in USA, wants to forecast sales for their product categories in their store based on the sales history of each category. Sales forecast has very high influence on the performance of the company’s business and hence these sales forecasts can be used to estimate company’s success or performance in the coming year. Accurate forecasts may lead to better decisions in business.
Sales or revenues forecasting is very important for retail operations . Forecasting of retail sales helps retailer to take necessary measures to plan their budgets or investments in a period (monthly, yearly) among different product categories like women clothing, men clothing and other clothing and at the same time they can plan to minimize revenue loss from unavailability of products by investing accordingly.
trainingData <- read.csv('train.csv', header = T)
Missing Data visualization will give us overall picture, whether data is sufficient or not to proceed for further analysis.
Splitting data into 3 product categories, so that each time series can be analyzed.
dfMale <- trainingData[which(trainingData$ProductCategory == "MenClothing"),]
dfMale_REM <- subset(dfMale, select = -c(ProductCategory))
dfFemale <- trainingData[which(trainingData$ProductCategory == "WomenClothing"),]
dfFemale_REM <- subset(dfFemale, select = -c(ProductCategory))
dfOther <- trainingData[which(trainingData$ProductCategory == "OtherClothing"),]
dfOthers_REM <- subset(dfOther, select = -c(ProductCategory))
timeStamp<-as.yearmon(paste(dfOthers_REM$Year, dfOthers_REM$Month), "%Y %m")
TS_dfMale_REM<- ts(dfMale_REM$Sales.In.ThousandDollars.,frequency = 12,start = c(2009,1))
TS_dfFemale_REM<- ts(dfFemale_REM$Sales.In.ThousandDollars.,frequency = 12,start = c(2009,1))
TS_dfOthers_REM<- ts(dfOthers_REM$Sales.In.ThousandDollars.,frequency = 12,start = c(2009,1))
Some of known ways to fill time series: 1)Last Observation Carried Forward- LOCF 2)Next Observation Carried Backward - NOCB 3)Kalman Smoothing 4)Interpolation
Method opted to fill our sales time series is locf (Last Observed Come First). LOCF Description: Replaces each missing value with the most recent present value prior to it
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
Decomposing Product Category MenClothing for visualizing random, seasonal and trend components of TimeSeries
Sales_train = subset(dfMale_REM, Year !="2015")
Sales_val = subset(dfMale_REM, Year =="2015")
Sales_traints = ts(Sales_train$Sales.In.ThousandDollars., frequency = 12,start=c(2009,1))
Sales_testts = ts(Sales_val$Sales.In.ThousandDollars., frequency = 12, start = c(2015,1))
Sales_HW <- HoltWinters(Sales_traints)
#Train Model MAPE
MAPE_train_HW <- mean(abs(Sales_traints-Sales_HW$fitted[,"xhat"])/abs(Sales_traints))*100
#Validation Forecast Mape
Sales_forecast_HW <- forecast:::forecast.HoltWinters(Sales_HW, h=12)
MAPE_test_HW <- mean(abs(Sales_testts - Sales_forecast_HW$mean)/abs(Sales_testts))*100
Sales_autoArima <- auto.arima(Sales_traints, ic='aic')
#Train Model MAPE
MAPE_train_AA <- mean(abs(Sales_traints-Sales_autoArima$x)/abs(Sales_traints))*100
#Validation Forecast Mape
Sales_forecast_AA <- forecast:::forecast.Arima(Sales_autoArima,h=12)
MAPE_test_AA <- mean(abs(Sales_testts-Sales_forecast_AA$mean)/abs(Sales_testts))*100
Sales_ETS <- ets(model = "ZMZ",Sales_traints)
#Train Model MAPE
MAPE_train_ETS <- mean(abs(Sales_traints-Sales_ETS$fitted)/abs(Sales_traints))*100
#Validation Forecast Mape
Sales_forecast_ETS <- forecast:::forecast.ets(Sales_ETS,h=12)
MAPE_test_ETS <- mean(abs(Sales_testts-Sales_forecast_ETS$mean)/abs(Sales_testts))*100
Sales_taf <- thetaf(Sales_traints)
#Train Model MAPE
MAPE_train_taf <- mean(abs(Sales_traints-Sales_taf$fitted)/abs(Sales_traints))*100
#Validation Forecast Mape
Sales_forecast_taf <- forecast:::forecast.Arima(Sales_autoArima,h=12)
MAPE_test_taf <- mean(abs(Sales_testts-Sales_forecast_taf$mean)/abs(Sales_testts))*100
Sales_xg <- forecastxgb:::xgbts(Sales_traints)
## Warning in forecastxgb:::xgbts(Sales_traints): xgbts is deprecated terminology and will soon be removed.
## Please use xgbar instead.
#Train Model MAPE
MAPE_train_xG <- mean(abs(Sales_traints-Sales_xg$y)/abs(Sales_traints))*100
#Validation Forecast Mape
Sales_forecast_xG <- forecast:::forecast(Sales_xg,h=12)
MAPE_test_xG <- mean(abs(Sales_testts-Sales_forecast_xG$mean)/abs(Sales_testts))*100
Decomposing Product Category WomenClothing for visualizing random, seasonal and trend components of TimeSeries
FE_Sales_train = subset(dfFemale_REM, Year !="2015")
FE_Sales_test = subset(dfFemale_REM, Year =="2015")
FE_Sales_traints = ts(FE_Sales_train$Sales.In.ThousandDollars., frequency = 12,start=c(2009,1))
FE_Sales_testts = ts(FE_Sales_test$Sales.In.ThousandDollars., frequency = 12, start = c(2015,1))
FE_Sales_HW <- HoltWinters(FE_Sales_traints)
#Train Model MAPE
FE_MAPE_train_HW <- mean(abs(FE_Sales_traints-FE_Sales_HW$fitted[,"xhat"])/abs(FE_Sales_traints))*100
#Validation Forecast Mape
FE_Sales_forecast_HW <- forecast:::forecast.HoltWinters(FE_Sales_HW, h=12)
FE_MAPE_test_HW <- mean(abs(FE_Sales_testts - FE_Sales_forecast_HW$mean)/abs(FE_Sales_testts))*100
FE_Sales_autoArima = auto.arima(FE_Sales_traints, ic='aic')
#Train Model MAPE
FE_MAPE_train_AA <- mean(abs(FE_Sales_traints-FE_Sales_autoArima$x)/abs(FE_Sales_traints))*100
#Validation Forecast Mape
FE_Sales_forecast_AA <- forecast:::forecast.Arima(FE_Sales_autoArima,h=12)
FE_MAPE_test_AA <- mean(abs(FE_Sales_testts-FE_Sales_forecast_AA$mean)/abs(FE_Sales_testts))*100
FE_Sales_ETS <- ets(model = "ZMZ",FE_Sales_traints)
#Train Model MAPE
FE_MAPE_train_ETS <- mean(abs(FE_Sales_traints-FE_Sales_ETS$fitted)/abs(FE_Sales_traints))*100
#Validation Forecast Mape
FE_Sales_forecast_ETS <- forecast:::forecast.ets(FE_Sales_ETS,h=12)
FE_MAPE_test_ETS <- mean(abs(FE_Sales_testts-FE_Sales_forecast_ETS$mean)/abs(FE_Sales_testts))*100
FE_Sales_taf <- thetaf(FE_Sales_traints)
#Train Model MAPE
FE_MAPE_train_taf <- mean(abs(FE_Sales_traints-FE_Sales_taf$fitted)/abs(FE_Sales_traints))*100
#Validation Forecast Mape
FE_Sales_forecast_taf <- forecast:::forecast.Arima(FE_Sales_autoArima,h=12)
FE_MAPE_test_taf <- mean(abs(FE_Sales_testts-FE_Sales_forecast_taf$mean)/abs(FE_Sales_testts))*100
FE_Sales_xg <- forecastxgb:::xgbts(FE_Sales_traints)
## Warning in forecastxgb:::xgbts(FE_Sales_traints): xgbts is deprecated terminology and will soon be removed.
## Please use xgbar instead.
#Train Model MAPE
FE_MAPE_train_xG <- mean(abs(FE_Sales_traints-FE_Sales_xg$y)/abs(FE_Sales_traints))*100
#Validation Forecast Mape
FE_Sales_forecast_xG <- forecast:::forecast(FE_Sales_xg,h=12)
FE_MAPE_test_xG <- mean(abs(FE_Sales_testts-FE_Sales_forecast_xG$mean)/abs(FE_Sales_testts))*100
Decomposing Product Category Other Clothing for visualizing random, seasonal and trend components of TimeSeries
OT_Sales_train = subset(dfOthers_REM, Year !="2015")
OT_Sales_test = subset(dfOthers_REM, Year =="2015")
OT_Sales_traints = ts(OT_Sales_train$Sales.In.ThousandDollars., frequency = 12,start=c(2009,1))
OT_Sales_testts = ts(OT_Sales_test$Sales.In.ThousandDollars., frequency = 12, start = c(2015,1))
OT_Sales_HW <- HoltWinters(OT_Sales_traints)
#Train Model MAPE
OT_MAPE_train_HW <- mean(abs(OT_Sales_traints-OT_Sales_HW$fitted[,"xhat"])/abs(OT_Sales_traints))*100
#Validation Forecast Mape
OT_Sales_forecast_HW <- forecast:::forecast.HoltWinters(OT_Sales_HW, h=12)
OT_MAPE_test_HW <- mean(abs(OT_Sales_testts - OT_Sales_forecast_HW$mean)/abs(OT_Sales_testts))*100
OT_Sales_autoArima <- auto.arima(OT_Sales_traints, ic='aic')
#Train Model MAPE
OT_MAPE_train_AA <- mean(abs(OT_Sales_traints-OT_Sales_autoArima$x)/abs(OT_Sales_traints))*100
#Validation Forecast Mape
OT_Sales_forecast_AA <- forecast:::forecast.Arima(OT_Sales_autoArima,h=12)
OT_MAPE_test_AA <- mean(abs(OT_Sales_testts-OT_Sales_forecast_AA$mean)/abs(OT_Sales_testts))*100
OT_Sales_ETS <- ets(model = "ZMZ",OT_Sales_traints)
#Train Model MAPE
OT_MAPE_train_ETS <- mean(abs(OT_Sales_traints-OT_Sales_ETS$fitted)/abs(OT_Sales_traints))*100
#Validation Forecast Mape
OT_Sales_forecast_ETS <- forecast:::forecast.ets(OT_Sales_ETS,h=12)
OT_MAPE_test_ETS <- mean(abs(OT_Sales_testts-OT_Sales_forecast_ETS$mean)/abs(OT_Sales_testts))*100
OT_Sales_taf <- thetaf(OT_Sales_traints)
#Train Model MAPE
OT_MAPE_train_taf <- mean(abs(OT_Sales_traints-OT_Sales_taf$fitted)/abs(OT_Sales_traints))*100
#Validation Forecast Mape
OT_Sales_forecast_taf <- forecast:::forecast.Arima(OT_Sales_autoArima,h=12)
OT_MAPE_test_taf <- mean(abs(OT_Sales_testts-OT_Sales_forecast_taf$mean)/abs(OT_Sales_testts))*100
OT_Sales_xg <- forecastxgb:::xgbts(OT_Sales_traints)
## Warning in forecastxgb:::xgbts(OT_Sales_traints): xgbts is deprecated terminology and will soon be removed.
## Please use xgbar instead.
#Train Model MAPE
OT_MAPE_train_xG <- mean(abs(OT_Sales_traints-Sales_xg$y)/abs(OT_Sales_traints))*100
#Validation Forecast Mape
OT_Sales_forecast_xG <- forecast:::forecast(OT_Sales_xg,h=12)
OT_MAPE_test_xG <- mean(abs(OT_Sales_testts-OT_Sales_forecast_xG$mean)/abs(OT_Sales_testts))*100
men <-list(c(MAPE_train_HW,MAPE_test_HW),
c( MAPE_train_AA,MAPE_test_AA),
c( MAPE_train_ETS,MAPE_test_ETS),
c( MAPE_train_taf,MAPE_test_taf),
c(MAPE_train_xG,MAPE_test_xG))
women <-list(c(FE_MAPE_train_HW,FE_MAPE_test_HW),
c( FE_MAPE_train_AA,FE_MAPE_test_AA),
c( FE_MAPE_train_ETS,FE_MAPE_test_ETS),
c( FE_MAPE_train_taf,FE_MAPE_test_taf),
c(FE_MAPE_train_xG,FE_MAPE_test_xG))
other <-list(c(OT_MAPE_train_HW,OT_MAPE_test_HW),
c( OT_MAPE_train_AA,OT_MAPE_test_AA),
c( OT_MAPE_train_ETS,OT_MAPE_test_ETS),
c( OT_MAPE_train_taf,OT_MAPE_test_taf),
c(OT_MAPE_train_xG,OT_MAPE_test_xG))
df<- as.data.frame(c(men,women,other))
#write.csv(df,"heello.csv")
df<-read.csv("heello.csv")
df
## ProductCategory Model Train Validation
## 1 Men HW 3.208222 6.174506
## 2 Men AA 0.000000 6.280798
## 3 Men ETS 2.948638 6.866537
## 4 Men TAF 11.849201 6.280798
## 5 Men XG 0.000000 7.266726
## 6 Women HW 5.296158 16.438342
## 7 Women AA 0.000000 18.646848
## 8 Women ETS 3.776894 5.927672
## 9 Women TAF 11.838242 18.646848
## 10 Women XG 0.000000 11.014408
## 11 Other HW 3.754543 15.141701
## 12 Other AA 0.000000 5.829530
## 13 Other ETS 3.719118 13.365118
## 14 Other TAF 7.940802 5.829530
## 15 Other XG 39.063186 9.922108
Clearly xgBoost overfits the Training Data for both Men and Women but intrestingly models goes off for Others category. Going by validation result best model for a)Men -> HoltWinter b)Women -> ETS c)Others -> AutoArima